Structure of the C-terminal domain of nsp4 from
feline coronavirus


Coronaviruses are a family of positive-stranded RNA viruses 
that includes important pathogens of humans and other 
animals. The large coronavirus genome (26-31 kb) encodes 
15-16 nonstructural proteins (nsps) that are derived from two 
replicase polyproteins by autoproteolytic processing. The nsps 
assemble into the viral replication-transcription complex and 
nsp3, nsp4 and nsp6 are believed to anchor this enzyme 
complex to modified intracellular membranes. The largest part 
of the coronavirus nsp4 subunit is hydrophobic and is 
predicted to be embedded in the membranes. In this report, 
a conserved C-terminal domain (~100 amino-acid residues) 
has been delineated that is predicted to face the cytoplasm and 
has been isolated as a soluble domain using library-based 
construct screening. A prototypical crystal structure at 2.8 Å
resolution was obtained using nsp4 from feline coronavirus.
Unmodified and SeMet-substituted proteins were crystallized
under similar conditions, resulting in tetragonal crystals that
belonged to space group P4₃. The phase problem was initially
solved by single isomorphous replacement with anomalous
scattering (SIRAS), followed by molecular replacement using
a SIRAS-derived composite model. The structure consists of
a single domain with a predominantly α-helical content
displaying a unique fold that could be engaged in
protein-protein interactions.


1. Introduction 

The Coronaviridae family, which is comprised of the genera 
Coronavirus and Torovirus, and the more distantly related 
Arteriviridae and Roniviridae families together form the order 
Nidovirales (Gorbalenya et al., 2006). Coronaviruses are 
positive-stranded RNA viruses that are frequently associated 
with enteric or respiratory diseases in humans, livestock and 
companion animals (Dye & Siddell, 2005). At present, they 
are formally classified into three genetic groups (1-3), with the
first two groups further divided into two subgroups (1a/b and
2a/b; Gorbalenya et al., 2004; Lai & Holmes, 2001), but as our
understanding of natural coronavirus diversity progresses,
novel subgroups continue to be recognized (Woo et al., 2009).
Viruses that belong to different subgroups have diverged 
profoundly. A fraction of their proteins are subgroup-specific 
and the amino-acid sequences of their most conserved 
proteins may differ by as much as 50%. The best-known 
member of this family, severe acute respiratory syndrome 
coronavirus (SARS-CoV), belongs to subgroup 2b, whereas 
feline coronavirus (FCoV), characterized in this study, belongs 
to subgroup 1a (Gorbalenya et al., 2004).

Feline infectious peritonitis virus (FIPV) is a pathogenic
FCoV variant that emerged by mutation of the relatively 
benign enteric FCoV (Poland et al., 1996; Vennema et al.,
1998) and causes a fatal immune-mediated disease in cats
(Pedersen, 1995). Because these variations are minor, we will
henceforth use FCoV as an abbreviation for the virus, even
though we are working with a construct derived from an FIPV
strain. The FCoV genome (strain FIPV WSU-79/1146)
consists of 29 125 nucleotides and contains six open reading
frames (ORFs; Dye & Siddell, 2005). The first two ORFs,
namely ORF1a and ORF1b, comprising the 5'-most gene (i.e.
gene 1), encode two large replicase polyproteins, pp1a and
pp1ab. Proteolytic cleavage of these polyproteins by
virus-encoded proteinases, i.e. the 3C-like main proteinase (Mpro in
nsp5) and the papain-like accessory proteinases (PLpro 1 and 2
in nsp3), is predicted to give rise to a total of 16 mature
nonstructural proteins (nsps; Ziebuhr et al., 2000; Dye &
Siddell, 2005). In addition to the proteases mentioned above,
associated enzymatic activities have been identified for several
coronavirus nsps, including deubiquitinating and adenosine
diphosphate-ribose-1''-phosphatase (ADRP) functions (nsp3),
RNA-dependent RNA polymerases with low (nsp8) and high
(nsp12) processivity, helicase (nsp13), RNA exonuclease and
N7-methyltransferase (nsp14), RNA endoribonuclease (nsp15)
and 2'-O-methyltransferase (nsp16) (Anand et al., 2003;
Bhardwaj et al., 2004; Chen et al., 2009; Cheng et al., 2005;
Gorbalenya et al., 1989; Harcourt et al., 2004; Imbert et al.,
2008; Ivanov, Hertzig et al., 2004; Ivanov, Thiel et al., 2004;
Ratia et al., 2006; Seybert et al., 2000; Snijder et al., 2003).
Tertiary structures, solved using X-ray and/or NMR analyses,
have been reported for a substantial number of nsps from at 
least one coronavirus, typically SARS-CoV (reviewed in 
Bartlam et al., 2005; Mesters et al., 2006). These structures 
represent a variety of separate domains, entire proteins and 
even multiprotein complexes. Despite these remarkable 
advances, many domains, including those residing in nsp4, 
remain poorly characterized. 

Nsp4 is an approximately 500-amino-acid replicase subunit 
that is released by the combined activity of the nsp3 and nsp5 
proteases. It is predicted to be one of the three membrane- 
spanning proteins (the others are nsp3 and nsp6) among 
coronavirus nsps and bioinformatic analyses consistently 
predict four transmembrane domains in nsp4 (Clementz et al., 
2008; Oostra et al., 2007). An N-terminal transmembrane 
region (amino acids 1-30) is presumably followed by a large 
lumenal domain (amino acids 30-280), three closely spaced 
additional transmembrane regions (amino acids 280-400) and 
finally a C-terminal domain of about 100 residues that is 
exposed at the cytoplasmic face of the membrane. Coronavirus
infection induces the extensive reorganization of endoplasmic
reticulum membranes into a reticulovesicular network
(Knoops et al., 2008) that includes many unusual
double-membrane vesicles (Gosert et al., 2002; Harcourt et al., 2004;
Shi et al., 1999; Snijder et al., 2006; Stertz et al., 2007). It is
currently believed that nsp4 functions in anchoring the viral
replication-transcription complex (RTC) to these modified
membranes and independent genetic studies have demonstrated
its importance for replication (Clementz et al., 2008;
Sparks et al., 2007).

In this paper, we present the first X-ray structure of the 
C-terminal domain of the FCoV nsp4. Together with structural 
data, a family-wide comparative sequence analysis of the nsp4 
C-terminal domain was performed in order to identify
residues/regions that might be important for function rather than
for structural integrity. 

2. Experimental procedures 

2.1. Library-based construct screening 

The sequence encoding the FCoV nsp4 (residues 2337-2826
of the polyprotein pp1a from strain FIPV WSU-79/1146;
GenBank/RefSeq accession No. NC_007025.1) was RT-PCR
amplified from viral RNA and cloned into the pMM8 vector.
pMM8 is a modified pET-43 (Novagen) bacterial expression
vector containing restriction sites suitable for
exonuclease-based construct-library generation (Cornvik et al., 2006) and a
Gateway cassette for recombination cloning inserted downstream
of the His-tag coding sequence. An N-terminally
deleted construct library was generated using an exonuclease
strategy and screened for a soluble construct using the
colony-filtration blot (Cornvik et al., 2005, 2006). A soluble and
well expressing construct containing residues 2731-2826 (here
called the nsp4ct domain) was chosen for scale-up expression
and purification. This construct has 14 additional N-terminal
residues, including a noncleavable His6 tag.
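
The numbering schemes used for this construct in the paper (pp1a polyprotein, mature nsp4 and the 1-96 numbering of the nsp4ct domain) differ only by constant offsets. The following is a minimal sketch of the conversion, assuming that nsp4 begins at pp1a residue 2337 and nsp4ct at pp1a residue 2731 as stated in the text; the function and constant names are illustrative and do not come from any published tool.

```python
# Offsets taken from the residue ranges quoted in the text.
NSP4_START_IN_PP1A = 2337     # first nsp4 residue in pp1a numbering
NSP4CT_START_IN_PP1A = 2731   # first nsp4ct residue in pp1a numbering


def pp1a_to_nsp4(pp1a_residue: int) -> int:
    """Convert a pp1a residue number to mature-nsp4 numbering (1-based)."""
    return pp1a_residue - NSP4_START_IN_PP1A + 1


def pp1a_to_nsp4ct(pp1a_residue: int) -> int:
    """Convert a pp1a residue number to nsp4ct domain numbering (1-based)."""
    return pp1a_residue - NSP4CT_START_IN_PP1A + 1


def nsp4ct_to_pp1a(domain_residue: int) -> int:
    """Convert an nsp4ct domain residue number back to pp1a numbering."""
    return domain_residue + NSP4CT_START_IN_PP1A - 1


if __name__ == "__main__":
    for residue in (2731, 2826):
        print(residue, pp1a_to_nsp4(residue), pp1a_to_nsp4ct(residue))
```

With these offsets the construct boundaries 2731 and 2826 map to nsp4 residues 395 and 490 and to domain residues 1 and 96, in agreement with the ranges given in the text.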

2.2. Expression 

The expression of soluble nsp4ct was performed in
Escherichia coli strain BL21 (DE3) (Novagen). Cultures were
grown at 310 K in LB medium containing 50 µg ml⁻¹
ampicillin until the OD600 reached 0.8. Protein synthesis was
induced by the addition of 1 mM isopropyl
β-D-1-thiogalactopyranoside (IPTG) and the culture was grown to
stationary phase overnight at 288 K. Cells were harvested by
centrifugation at 4000g (30 min, 277 K) and frozen at 253 K.
Selenomethionine-substituted nsp4ct was expressed in the
non-methionine auxotrophic E. coli strain BL21 (DE3)
(Novagen). Bacteria were grown in minimal medium at 310 K
until the OD600 reached 0.8. Feedback-inhibition amino-acid
mix (Lys, Thr, Phe, Leu, Ile, Val and SeMet) was added and
after 15 min cells were induced with 1 mM IPTG. The culture
was left shaking overnight at 288 K and the cells were
subsequently harvested by centrifugation at 4000g (30 min, 277 K).
Cell pellets were frozen at 253 K.

2.3. Purification 

Both the native and the SeMet-substituted nsp4ct proteins
were purified following the same protocol. A pellet from 1 l
cell culture was resuspended in 20 ml buffer A (10 mM CHES
pH 9.1 and 300 mM NaCl, plus 2 mM β-mercaptoethanol in
the case of SeMet-substituted nsp4ct). The cells were
sonicated and the protein was purified from the soluble cellular
fraction by Ni-NTA affinity chromatography and eluted with
buffer A containing 500 mM imidazole. The eluate was
buffer-exchanged into buffer A using PD10 columns (GE Healthcare
Life Sciences). nsp4ct was subsequently concentrated and
applied onto a Superdex 75 (16/60) gel-filtration column (GE
Healthcare Life Sciences) pre-equilibrated with buffer A. The
protein was concentrated to 10 mg ml⁻¹ and its purity was
examined by SDS-PAGE.

2.4. Crystallization 

Initial crystallization trials were carried out using the 
sitting-drop vapour-diffusion method in 96-well plates 
(Greiner) at 292 K at the EMBL Hamburg High-throughput 
Crystallization Facility (Mueller-Dieckmann, 2006). Crystals 
were obtained under various conditions. Further optimization 
of these conditions was performed manually in 24-well plates 
(Qiagen) using the hanging-drop vapour-diffusion method at 
292 K. Crystals were obtained at a protein concentration of
7 mg ml⁻¹ in 0.22 M ammonium sulfate and 25%(w/v) PEG
5000.

2.5. Data collection and processing 

The crystals were cryoprotected in a solution consisting of 
0.22 M ammonium sulfate, 25%(w/v) PEG 5000 and 15%(v/v) 
ethylene glycol prior to data collection. Three data sets were 
collected: two single-wavelength native data sets (data sets 1 
and 2) and a single-wavelength anomalous diffraction (SAD)
data set (at peak wavelength; data set 3). Data set 1 was
collected from a single crystal at 100 K on the European
Synchrotron Radiation Facility (ESRF) beamline ID23-2
using a MAR 225 CCD detector. The oscillation range was 1°,
with a crystal-to-detector distance of 346.2 mm. A total of 90
images were collected to a maximum resolution of 3.1 Å. Data set 2
was collected from a single crystal at 100 K on the EMBL
beamline X12 at DESY using a MAR 225 detector. The
crystal-to-detector distance was 300 mm, with an oscillation
range of 0.25°. A total of 670 images were collected to a
maximum resolution of 2.76 Å. Data set 3 was also collected
from a single crystal on beamline X12 (EMBL Hamburg). The
crystal-to-detector distance was 280 mm and the oscillation
range was 1°. A total of 200 images at the selenium absorption
edge were collected to a maximum resolution of 3.3 Å.
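
The total rotation covered by each data set follows directly from the number of images and the oscillation range per image. The lines below are a minimal sketch of that arithmetic using only the values quoted above; the dictionary layout and names are illustrative.

```python
# (number of images, oscillation range per image in degrees), as quoted in the text
DATA_SETS = {
    "1 (native)": (90, 1.0),
    "2 (native)": (670, 0.25),
    "3 (SeMet)": (200, 1.0),
}


def total_rotation(n_images: int, oscillation_deg: float) -> float:
    """Total crystal rotation in degrees covered by a contiguous sweep."""
    return n_images * oscillation_deg


SWEEPS = {name: total_rotation(*params) for name, params in DATA_SETS.items()}

for name, sweep in SWEEPS.items():
    print(f"Data set {name}: {sweep:.1f} degrees")
```

This assumes each data set was recorded as one contiguous sweep, which the text does not state explicitly.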

In all three cases the recorded images were processed with 
XDS (Kabsch, 1988) and the reflection intensities were 
processed with COMBAT and scaled with SCALA (Evans,
1993) from the CCP4 program suite (Collaborative
Computational Project, Number 4, 1994). Data-collection statistics
are shown in Table 1. 

2.6. Structure determination 

The structure was solved using the SIRAS protocol of the
Auto-Rickshaw automated crystal structure-determination
platform (Panjikar et al., 2005). F_A values were calculated
using the program SHELXC (Sheldrick, 2008). Based on an
initial analysis of the data, the maximum resolution for
substructure determination and initial phase calculation was
set to 3.8 Å. A total of 20 selenium positions were found using the
program SHELXD (Sheldrick, 2008). The correct hand of the
substructure was determined using the programs ABS (Hao,
2004) and SHELXE (Sheldrick, 2008). The occupancy of all
substructure atoms was refined using the program BP3 (Pannu
et al., 2003; Pannu & Read, 2004). The initial phases were
improved using density modification, noncrystallographic
symmetry (NCS) averaging and phase extension using the
program RESOLVE (Terwilliger, 2000). A partial α-helical
model was produced using the program HELICAP (Morris et
al., 2004). The partial model contained 119 of the total of 440
residues expected for four molecules. The initial phases were
improved by phase combination of experimental and model
phases using the program SIGMAA (Read, 1986). The density
modification and fourfold NCS averaging were repeated
as described above. The resultant phases were used to
continue model building using the program ARP/wARP
(Perrakis et al., 1999), resulting in the placement of 242
residues. The partial models generated in the intermediate steps
of ARP/wARP were then used to assemble an almost
complete dimer using the graphics program Coot (Emsley &
Cowtan, 2004). This dimer was then used to find the second
dimer in the electron density using phased
molecular-replacement techniques as implemented in MOLREP (Vagin &
Teplyakov, 1997). 2Fo − Fc and Fo − Fc electron-density maps
calculated at this stage showed additional electron density
indicating the presence of a fifth molecule in the asymmetric
unit. The phased molecular replacement was repeated to
place the fifth molecule in the electron-density map. The
resultant model was then used for restrained refinement in
REFMAC5 (Murshudov et al., 1997), including use of the
translation, libration and screw method (TLS; Schomaker &
Trueblood, 1968) for describing group motions.

The structure was manually modified, followed by cycles of 
refinement, using the program Coot. The progress of the 
refinement was monitored by means of the free R factor 
(Brünger, 1992). Water molecules were included where clear
peaks were present in both the 2Fo − Fc and Fo − Fc maps and
where appropriate hydrogen bonds could be made to 
surrounding residues or to other water molecules. The 
stereochemistry of the model was evaluated with the program 
MOLPROBITY (Davis et al., 2007). 
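
The working and free R factors used to monitor refinement share one formula, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, and differ only in which reflections are summed. The sketch below illustrates this with made-up amplitudes, assuming that the calculated amplitudes are already on the same scale as the observed ones; the numbers and the free-set flags are hypothetical and are not data from this study.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor for paired observed/calculated amplitudes."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical reflections: (|Fobs|, |Fcalc|, belongs_to_free_set)
REFLECTIONS = [
    (100.0, 90.0, False),
    (50.0, 55.0, False),
    (80.0, 70.0, True),
    (20.0, 26.0, False),
    (40.0, 30.0, True),
]

work = [(o, c) for o, c, is_free in REFLECTIONS if not is_free]
free = [(o, c) for o, c, is_free in REFLECTIONS if is_free]

# zip(*pairs) splits the list of pairs into (all Fobs, all Fcalc)
R_WORK = r_factor(*zip(*work))
R_FREE = r_factor(*zip(*free))

print(f"R_work = {R_WORK:.3f}, R_free = {R_FREE:.3f}")
```

The free set is excluded from refinement, so R_free serves as a cross-validation statistic. In Table 2, 927 of the 18 174 reflections (about 5%) were assigned to the free set.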

Interfaces between molecules were analyzed with the PISA 
server (Krissinel & Henrick, 2007). Interactions between 
molecules were initially evaluated using the CCP4 program 
CONTACT with a maximum contact distance of 3.6 Å.
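
A distance-cutoff contact search of the kind described here can be illustrated as follows. This is only a sketch of the idea and not the CCP4 CONTACT implementation; the atom labels and coordinates are invented for illustration and are not taken from the refined structure.

```python
import math

CUTOFF = 3.6  # maximum contact distance in Å, as quoted in the text


def contacts(atoms_1, atoms_2, cutoff=CUTOFF):
    """Return (label_1, label_2, distance) for all atom pairs within the cutoff."""
    found = []
    for label_1, xyz_1 in atoms_1:
        for label_2, xyz_2 in atoms_2:
            distance = math.dist(xyz_1, xyz_2)
            if distance <= cutoff:
                found.append((label_1, label_2, round(distance, 2)))
    return found


# Hypothetical atoms from two chains: (label, (x, y, z) in Å)
CHAIN_1 = [("atom_1a", (0.0, 0.0, 0.0)), ("atom_1b", (10.0, 0.0, 0.0))]
CHAIN_2 = [
    ("atom_2a", (2.9, 0.0, 0.0)),
    ("atom_2b", (10.0, 2.7, 0.0)),
    ("atom_2c", (30.0, 0.0, 0.0)),
]

PAIRS = contacts(CHAIN_1, CHAIN_2)
for pair in PAIRS:
    print(pair)
```

The nested loop compares every atom pair, which is adequate for a small interface; a real program would typically use a spatial grid and would also apply crystallographic symmetry.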

3. Results and discussion 
3.1. Structure determination 

The recombinant His6-tagged FCoV nsp4ct domain (residues
2731-2826 of pp1a and residues 395-490 of nsp4; here
renumbered as 1-96) was expressed in E. coli. The protein was
also expressed with the substitution of methionine by
selenomethionine (SeMet). The incorporation of SeMet was verified
by matrix-assisted laser desorption ionization (MALDI) mass
spectrometry. Both native and SeMet-substituted proteins
were crystallized from conditions containing ammonium sulfate
and PEG 5000. The crystals belonged to space group P4₃, with
unit-cell parameters a = b = 127.5, c = 42.8 Å (data set 2 in
Table 1). There are five molecules (chains A-E) in the
asymmetric unit, which corresponds to a 64% solvent content
(Matthews, 1968). The structure was refined at 2.8 Å
resolution to a final R value of 24.0% (Rfree = 29.9%). The
final model contains 96 residues in molecule A (residues 0-95),
93 residues in molecule B (residues 0-49 and 53-95), 92
residues in molecule C (residues 0-91), 91 residues in
molecule D (residues 1-91) and 84 residues in molecule E
(residues 0-49 and 56-89). 88.0% of the residues are located
in the preferred regions of the Ramachandran diagram and
10.9% are in allowed regions. Residues Met55 (chains A and B),
Glu57 (chains C and D) and Ala58 (chains A-E) are
Ramachandran outliers. Glu57 and Ala58 are located in the
N-terminus of helix α3. The geometry in this region may be
influenced by hydrogen bonding between Glu57 Oε and
Arg61 Nη. The refined structure contains two sulfate ions and
40 solvent molecules. A detailed summary of the data-collection
and structure-refinement statistics is given in Tables 1 and 2.

Table 1

Data collection.

Values in parentheses are for the last resolution shell.

Data set                      1 (native)                2 (native)                3 (SeMet)
Crystallization conditions    0.22 M (NH4)2SO4,         0.22 M (NH4)2SO4,         0.22 M (NH4)2SO4,
                              25%(w/v) PEG 5000         25%(w/v) PEG 5000         25%(w/v) PEG 5000
X-ray source                  ID23-2                    X12                       X12
Space group                   P4₃                       P4₃                       P4₃
Unit-cell parameters (Å)      a = b = 128.0, c = 43.7   a = b = 127.5, c = 42.7   a = b = 128.7, c = 42.3
Wavelength (Å)                0.872                     0.978                     0.9777
Resolution range (Å)          50.0-3.1                  20.0-2.8                  50.0-3.3
Mosaicity                     0.14                      0.16                      0.2
Mean I/σ(I)                   14.1 (2.9)                23.2 (3.6)                11.9 (4.0)
Rfac (linear) (%)             8.5 (44.7)                5.3 (44.1)                9.7 (32.9)
Redundancy                    3.6                       6.8                       3.8
Rmeas (%)                     10.2 (56.2)               5.7 (48.0)                11.1 (37.8)
No. of observations           48030                     109977                    76895†
No. of unique reflections     13150                     18178                     20164†
Completeness (%)              99.3 (97.7)               99.1 (96.6)               99.2 (96.1)

† Friedel pairs were not merged.


Table 2

Refinement.

Space group                           P4₃
Resolution range (Å)                  19.9-2.8
No. of reflections (working/free)     17247/927
No. of protein residues               A, 96; B, 93; C, 92; D, 91; E, 84
No. of waters                         40
No. of sulfate molecules              2
Rwork/Rfree (%)                       24.0/29.9
Average B (Å²)                        68.7
R.m.s. deviation from ideal values
  Bond lengths (Å)                    0.012
  Bond angles (°)                     1.6
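
The solvent content quoted in Section 3.1 is derived from the Matthews coefficient, V_M = V_cell / (Z × n × M), with the solvent fraction estimated as 1 − 1.23/V_M. The sketch below assumes a tetragonal cell (V = a²c), four asymmetric units per unit cell in P4₃ and five molecules per asymmetric unit. The molecular mass per chain is not given in the text, so the value used here (10 500 Da) is a hypothetical placeholder and the result only approximates the published figure.

```python
def matthews(a, c, n_asu_per_cell, n_mol_per_asu, mass_da):
    """Return (V_M in Å^3/Da, estimated solvent fraction) for a tetragonal cell."""
    cell_volume = a * a * c                      # a = b, all angles 90 degrees
    v_m = cell_volume / (n_asu_per_cell * n_mol_per_asu * mass_da)
    solvent_fraction = 1.0 - 1.23 / v_m          # Matthews (1968) approximation
    return v_m, solvent_fraction


# Cell parameters from the text; the chain mass is an assumed placeholder.
V_M, SOLVENT = matthews(a=127.5, c=42.8, n_asu_per_cell=4,
                        n_mol_per_asu=5, mass_da=10_500.0)

print(f"V_M = {V_M:.2f} A^3/Da, solvent fraction = {SOLVENT:.2f}")
```

The estimate is sensitive to the assumed mass, for example to whether the 14 additional N-terminal residues are counted, so it should be recomputed with the actual mass of the construct.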

3.2. Overall structure 



Figure 1

Overall structure of nsp4ct domain (molecule A is shown). α-Helices are
shown in purple, β-strands are shown in yellow, loops and termini are
shown in light blue and regions forming β-strands present only in the
dimer interface are depicted in red.


The FCoV nsp4ct structure contains six short β-strands
β1-β6 and four α-helices α1-α4 (Fig. 1). Strands β1 and β2
and strands β3 and β5 form small two-stranded antiparallel
sheets. Strands β4 and β6 participate in the formation of the
dimer interface. Strand β4 is observed in molecules C and D
and strand β6 is observed in molecules A-D. The characteristic
feature of the structure is the 21-residue-long helix
α4. Analysis using EBI web tools (PDBsum/ProFunc,
Catalytic site search; http://www.ebi.ac.uk), DALI
(http://ekhidna.biocenter.helsinki.fi/dali_server/) and GRATH
(http://protein.hbu.cn/cath/cathwww.biochem.ucl.ac.uk/cgi-bin/cath/Grath.html)
for both the monomer and dimer did not result in
any significant indicators of function or similarities in
structure. The structure therefore represents a novel protein fold.
Nsp4ct has nine conserved and two nonconserved hydrophobic
residues (Fig. 2) which form the hydrophobic core of
the structure. These residues are grouped into a mainly
aliphatic group (Phe19, Ile21, Leu29, Ile38, Leu68, Leu72) and
two aromatic groups (Phe11, Tyr41, Tyr65 and Tyr26, Tyr75)
(Fig. 3).

The r.m.s.d. between Cα atoms of molecules A-E is less than
0.1 Å (for the 63 common Cα atoms). Differences between
molecules are present in the N-terminus (residues 0-4), the
C-terminus (from residue 88 onwards) and the region between
residues 46 and 64, which includes the C-terminal part of helix
α3, the flexible loop Lα3-α4 and the N-terminus of helix α4.
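
The r.m.s.d. between matched Cα atoms is the square root of the mean squared distance between equivalent atoms. The sketch below assumes that the two coordinate sets have already been superposed and are listed in the same order; the coordinates are invented for illustration and are not taken from the nsp4ct model.

```python
import math


def rmsd(coords_1, coords_2):
    """Root-mean-square deviation between two equally long, pre-superposed sets."""
    if len(coords_1) != len(coords_2) or not coords_1:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_1, coords_2))
    return math.sqrt(squared / len(coords_1))


# Hypothetical C-alpha positions (Å) for three equivalent residues in two chains
CHAIN_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
CHAIN_2 = [(0.0, 0.0, 0.2), (3.8, 0.2, 0.0), (7.4, 0.0, 0.0)]

RMSD_VALUE = rmsd(CHAIN_1, CHAIN_2)
print(f"r.m.s.d. = {RMSD_VALUE:.2f} A")
```

In practice the optimal superposition is computed first (for example with a least-squares fit) and only atoms common to all chains are compared, as with the 63 common Cα atoms mentioned above.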
Figure 2

Alignment of amino-acid sequences from nsp4ct proteins, coupled with secondary-structure information from the FCoV nsp4ct three-dimensional
structure. The alignment is based on amino-acid data for feline infectious peritonitis virus (FCoV, NC_007025.1), human coronavirus NL63 (HCoV,
ABE97129), murine hepatitis virus (MHV, NP_001012459.1), severe acute respiratory syndrome coronavirus (SARS_CoV, NP_904322.1) and infectious
bronchitis virus strain Beaudette (IBV, NP_740625). The alignment was produced with ClustalW2 (Larkin et al., 2007) and edited with JalView. Residues
are coloured according to conservation from fully conserved (dark blue) to nonconserved (colourless).


3.3. Dimer interface 

The FCoV nsp4ct crystal contains five molecules in the
asymmetric unit. Molecules A/C and B/D form very similar
dimers (Fig. 4a), each of approximate dimensions 60 × 20 ×
20 Å. The average buried surface area per molecule is
approximately 961 Å². The buried surface of each dimer
involves approximately 25 residues from α1, α3, α4, loop
Lα3-α4 and the C-terminus. Interestingly, the interaction
interface contains an intermolecular three-stranded
antiparallel β-sheet. The order of the strands in this sheet is
β4C-β6A-β6C in the case of dimer A/C and β4D-β6B-β6D in the case
of dimer B/D. Strands β4C and β4D include residues 51-53,
strands β6A and β6B include residues 89-92 and strands β6C
and β6D include residues 88-90. The major interactions at this
interface are the β-sheet hydrogen bonds Val89 N···Gly53 O,
Val89 O···Gly53 N, Ser90 N···Ser90 O, Ser90 O···Ser90 N,
Val91 N···Tyr51 O, Asn92 N···Tyr88 O and Asn92 O···Tyr88 N.
Five hydrogen bonds located outside the β-sheet region are
formed: Met55 O···Tyr60 Oη, Tyr60 Oη···Met55 O,
Tyr60 Oη···Met55 N, Thr94 Oγ···Thr85 O and Thr94 Oγ···Thr85 Oγ
(Fig. 4b). Strong van der Waals contacts between residues in
the dimer buried area are also important in defining the
interface.

Figure 3

A ribbon view of FCoV nsp4ct is shown with the side chains of the
hydrophobic residues important for protein folding depicted as van der
Waals spheres. The residues are divided into three groups, namely the
mainly aliphatic group (Phe19, Ile21, Leu29, Ile38, Leu68 and Leu72,
yellow), aromatic group 1 (Phe11, Tyr41 and Tyr65, red) and aromatic
group 2 (Tyr26 and Tyr75, green).

The results obtained from analytical size-exclusion
chromatography of nsp4ct demonstrated that the protein is
monomeric in solution under the experimental conditions 
used (results not shown). Furthermore, the crystal structure 
contains a monomer as well as two dimers. The buried surface 
area supports the likelihood of dimerization and this may have 
physiological significance. It is conceivable that in vivo nsp4 
dimerization during membrane modification or formation of 
the RTC may help in bringing the components together and 
could therefore aid their correct spatial orientation. This 
would agree with the previously proposed role of nsp4 as an 
anchor for the assembly of the viral RTC. 

3.4. Nsp4ct sequence alignment 

Fig. 2 shows the sequence alignment, produced with
ClustalW2 (Larkin et al., 2007), of the C-terminal domain of
nsp4 for the five coronaviral subgroups. These viruses are
FCoV (group 1a), human coronavirus NL63 (HCoV-NL63;
group 1b), murine hepatitis virus (MHV; group 2a),
SARS-CoV (group 2b) and infectious bronchitis virus (IBV; group 3).
The nsp4ct sequence identity between viruses belonging to the
same group (but different subgroups) is higher than that for
viruses belonging to different groups. The sequence identity
between FCoV and HCoV-NL63 is 68% and that between
MHV and SARS-CoV is 53%. IBV, on the other hand, is the
most distantly related virus and its sequence identity in all
possible combinations with the other viruses is around 35%.
This is consistent with previously published phylogenetic
analyses of coronaviruses (Gorbalenya et al., 2004). The
sequence alignment shows a high level of conservation of the
nsp4ct domain, with 17 of around 100 residues being identical
between all five subgroups. Most of the aromatic amino acids
of coronavirus nsp4ct are highly conserved. This includes
residues Phe19, Tyr41, Tyr50, Tyr60 and Tyr84, which are fully
conserved, and Phe11 (Tyr in IBV), Tyr26 (Phe in IBV), Phe45
(Tyr in four other coronaviruses) and Tyr51 and Tyr75 (both
Phe in MHV and SARS-CoV). Interestingly, Phe45, Tyr50,
Tyr51, Tyr60 and Tyr84 are part of the FCoV nsp4ct dimer
interface and the fully conserved Tyr60 forms a side-chain
(Oη) hydrogen-bond interaction with the main-chain carbonyl
of Met55 from the second monomer. The two fully conserved
C-terminal residues (Leu95 and Gln96) are part of the
recognition site for the coronavirus Mpro (Hegyi & Ziebuhr,
2002).
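
Pairwise identities such as those quoted in Section 3.4 are obtained by counting identical residues over aligned columns. The sketch below assumes that identity is computed over columns in which neither sequence has a gap; other conventions exist and the text does not state which one was used. The two aligned strings are invented and are not coronavirus sequences.

```python
def percent_identity(aligned_1: str, aligned_2: str, gap: str = "-") -> float:
    """Percentage of identical residues over gap-free aligned columns."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_1, res_2 in zip(aligned_1, aligned_2):
        if res_1 == gap or res_2 == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_1 == res_2:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned sequences (same length, '-' marks a gap)
SEQ_1 = "MKTAY-LVGA"
SEQ_2 = "MKSAYQLV-A"

IDENTITY = percent_identity(SEQ_1, SEQ_2)
print(f"{IDENTITY:.1f}% identity")
```

Applying such a function to every pair of rows in a multiple alignment gives an identity matrix comparable to the pairwise values discussed above.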

Figure 4

(a) The dimer interface between molecules A and C is shown in cartoon representation as a
stereo pair. Molecule A is shown in green and molecule C is shown in yellow. The surfaces of
the monomers at the interface are shown in mesh representation. (b) Hydrogen bonds
at the dimer interface (Val89 N···Gly53 O, Val89 O···Gly53 N, Ser90 N···Ser90 O,
Ser90 O···Ser90 N, Val91 N···Tyr51 O, Asn92 N···Tyr88 O, Asn92 O···Tyr88 N,
Met55 O···Tyr60 Oη, Tyr60 Oη···Met55 O, Tyr60 Oη···Met55 N, Thr94 Oγ···Thr85 O and
Thr94 Oγ···Thr85 Oγ) are shown as dashed lines. Molecule A is shown in green, molecule C is
shown in yellow, O atoms are shown in red and N atoms are shown in blue. For clarity, the
arrows (cartoon representation) of the strands forming the dimer interface are not shown in
this panel.

There are four clusters of highly conserved residues. The
first is between residues 9 and 19 and includes residues in helix
α1 and strand β3. The second comprises residues 45-53 that
belong to helix α3 and part of loop Lα3-α4. Interestingly, the
five independent chains of the FCoV nsp4ct structure differ
most profoundly in this region, suggesting that it is highly
flexible. In the cases of molecules B and E it was disordered
and there was no electron density visible for residues 50-52
and 50-55, respectively. In molecules A, C and D this region
could be placed into electron density and is involved in dimer
formation. The high sequence conservation of this cluster and its
structural flexibility suggest that it may play an important role
in the nsp4ct domain function. Residues 60-71 that belong to
helix α4 form the third highly conserved cluster and the fourth
cluster consists of the C-terminal residues 81-96. This last
cluster contains the highly conserved Tyr84, Pro86 and Pro87,
which form the YxPP motif, which is the inverse of the
consensus PPxY sequence recognized by the class I WW domains
(Linn et al., 1997). Di Leva et al. (2006) showed that the
class I WW domain does not require a peptide with a consensus
sequence and can also bind an inverted peptide sequence. The
only condition is the presence of the polyproline II (PPII)
conformation, which is observed in the case of FCoV nsp4ct.
This suggests that region 84-87 is a reasonable candidate for
protein-protein interactions. PROSITE
(http://www.expasy.org/prosite/) analysis of all FCoV nsps did
not identify any possible WW domains, suggesting that the YxPP
motif interacting partner is a host protein. Furthermore,
localization of the Pro-Pro motif may protect the extended
unstructured C-terminus from proteolytic cleavage by host
enzymes (Vanhoof et al., 1995).
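
Short motifs such as YxPP and the consensus PPxY discussed in Section 3.4 can be located in a sequence with a simple pattern search. The sketch below uses regular expressions in which x is taken to mean any residue; the test sequence is invented and is not the FCoV nsp4ct sequence.

```python
import re

# One-letter motif patterns; '.' stands for the variable position x.
MOTIFS = {"YxPP": "Y.PP", "PPxY": "PP.Y"}


def find_motifs(sequence: str) -> dict:
    """Return 1-based start positions of each motif, allowing overlapping hits."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        lookahead = re.compile(f"(?={pattern})")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(sequence)]
    return hits


# Hypothetical sequence containing one YxPP and one PPxY motif
TOY_SEQUENCE = "GSAYTPPLKDPPGYAA"

HITS = find_motifs(TOY_SEQUENCE)
print(HITS)
```

A sequence match alone does not establish binding competence; as noted above, the relevant condition is that the segment adopts a polyproline II conformation.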


4. Conclusions 

The high conservation of the C-terminal
domain of nsp4 suggests not only that it
plays a ubiquitous role in the coronavirus
life cycle, but also that nsp4 proteins from
different subgroups are structurally similar
and have similar modes of operation. In this
context, it is a surprising finding that deletion
of the nsp4ct of MHV (using a reverse
genetics system) was reported to be tolerated
by the virus (Clementz et al., 2008;
Sparks et al., 2007), with the resulting
mutant displaying a modestly attenuated
phenotype. Thus, although a similar mutant
has not been generated for FCoV or any
other coronavirus, the conservation of the
nsp4ct domain outlined above would suggest that it is not
absolutely required for coronavirus RNA synthesis and/or
RTC formation per se. This opens the possibility that, like
some other recently characterized coronavirus enzyme
functions (Eriksson et al., 2008; Roth-Cross et al., 2009), the nsp4ct
domain might play a role in specific virus-host interactions of
the type that are not easily uncovered in cell culture-based
systems for virus propagation. Further functional studies are
required in order to better understand the detailed role of
nsp4 and to identify its partners and therefore its significance
in the viral life cycle.